Fast Lexicon-Based Word Recognition in Noisy Index Card Images

Authors

  • Simon M. Lucas
  • Gregory Patoulas
  • Andy C. Downton
Abstract

This paper describes a complete system for reading typewritten lexicon words in noisy images, in this case museum index cards. The system is conceptually simple and straightforward to implement. It involves three stages of processing. The first stage extracts row regions from the image, where each row is a hypothesized line of text. The next stage scans an OCR classifier over each row image, creating a character hypothesis graph in the process. This graph is then searched with a priority-queue-based algorithm for the best matches against a set of words (the lexicon). Performance evaluation on a set of museum archive cards indicates competitive accuracy and reasonable throughput. The priority-queue algorithm is over two hundred times faster than flat dynamic programming on these graphs.
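
The third stage can be illustrated with a small sketch. The abstract does not give the details of the authors' algorithm, so everything below is an assumption made for illustration: the graph is a list of (source position, destination position, character, cost) edges with non-negative costs (for example, negative log classifier scores), the lexicon is stored in a trie, and the search is a plain uniform-cost search over (graph position, trie node) pairs. The names `build_trie` and `best_lexicon_match` are hypothetical.

```python
import heapq
from collections import defaultdict

END = ""  # end-of-word marker; never collides with a one-character edge label


def build_trie(words):
    """Store the lexicon as nested dicts, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def best_lexicon_match(edges, start, goal, lexicon):
    """Return (word, cost) for the cheapest start-to-goal path that spells
    a lexicon word, or None if no such path exists.

    edges: iterable of (src, dst, char, cost) with cost >= 0 (assumed).
    """
    out = defaultdict(list)
    for src, dst, ch, cost in edges:
        out[src].append((dst, ch, cost))

    trie = build_trie(lexicon)
    counter = 0  # tie-breaker so the heap never has to compare dicts
    heap = [(0.0, counter, start, trie)]
    seen = set()

    while heap:
        cost, _, pos, tnode = heapq.heappop(heap)
        # Costs are non-negative, so the first complete word popped is optimal.
        if pos == goal and END in tnode:
            return tnode[END], cost
        key = (pos, id(tnode))
        if key in seen:
            continue
        seen.add(key)
        for dst, ch, edge_cost in out[pos]:
            child = tnode.get(ch)
            if child is not None:  # prune paths that leave the lexicon
                counter += 1
                heapq.heappush(heap, (cost + edge_cost, counter, dst, child))
    return None


# Toy graph over positions 0..3 with competing character hypotheses.
edges = [
    (0, 1, "c", 0.1), (0, 1, "e", 0.05), (0, 2, "d", 0.5),
    (1, 2, "a", 0.2), (1, 2, "o", 0.1),
    (2, 3, "t", 0.1), (2, 3, "l", 0.3),
]
result = best_lexicon_match(edges, 0, 3, ["cat", "cot", "dog", "eel"])
print(result)
```

In this toy graph the cheapest path that spells a lexicon word is "cot". Pruning against the trie means a partial path is dropped as soon as it stops being a prefix of any lexicon word, which is one plausible reason a priority-queue search can explore far fewer states than exhaustive dynamic programming over every word.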

Similar resources

A two-stage method for recognizing handwritten Farsi words using adaptive blocking of the image gradient

This paper presents a two-step method for offline handwritten Farsi word recognition. In the first step, to improve recognition accuracy and speed, an algorithm is proposed for the initial elimination of lexicon entries unlikely to match the input image. For lexicon reduction, the words of the lexicon are clustered using the ISOCLUS and hierarchical clustering algorithms. Clustering is based on the featur...

Emergency medicine, disease surveillance, and informatics

Traditional handwriting recognition algorithms rely heavily on small lexicons and clean word images. Unfortunately, emergency medical documents do not satisfy either of these conditions. This paper describes a strategy whereby given an image representing a noisy handwritten word from a medical document, and a large lexicon consisting of English, medical and pharmacological words, symbols, abbre...

Fast Identification of Stop Words for Font Learning and Keyword Spotting

A recently proposed adaptive strategy for text recognition uses a linguistic fact that over half of the words on a typical English page are among 150 common stop words. The small lexicon permits word-shape based recognition that yields word identities from which character prototypes can be extracted. This paper describes a fast procedure for locating the best candidates for those stop words. Th...

A word shape analysis approach to lexicon based word recognition

Ho, T.K., J.J. Hull and S.N. Srihari, A word shape analysis approach to lexicon based word recognition, Pattern Recognition Letters 13 (1992) 821-826. A method for word recognition is presented that is based on an analysis of the shape of a word as a whole object. It is demonstrated to be a useful alternative for recognizing degraded word images that are prone to errors in character segmentation.

A Computational Model for Recognition of Multi-font Word Images

A computational model for the recognition of multi-font machine-printed word images of highly variable quality is given. The model integrates three word recognition algorithms, each of which utilizes a different form of shape and context information. The approaches are character recognition based, segmentation based, and word-shape analysis based. The model overcomes limitations of previous solu...

Publication date: 2003